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£SJ ■ Abstract. This paper focuses on the restart strategy of CMA-ES on 

multi-modal functions. A first alternative strategy proceeds by decreasing 
the initial step-size of the mutation while doubling the population size at 
each restart. A second strategy adaptively allocates the computational 
budget among the restart settings in the BIPOP scheme. Both restart 
strategies are validated on the BBOB benchmark; their generality is 
also demonstrated on an independent real-world problem suite related 
to spacecraft trajectory optimization. 

^ ' 1 Introduction 

The long tradition of performance of the Covariance Matrix Adaptation Evolu- 
tion Strategy (CMA-ES) algorithm on real-world problems (with over 100 pub- 
| lished applications [6] ) is due among others to its good behavior on multi-modal 

. functions. Two versions of CMA-ES with restarts have been proposed to handle 

multi-modal functions: IPOP-CMA-ES [2] was ranked first on the continuous 
optimization benchmark at CEC 2005 [4,3]; and BIPOP-CMA-ES [5] showed 
1 the best results together with IPOP-CMA-ES on the black-box optimization 

CN benchmark (BBOB) in 2009 and 2010. 

This paper focuses on analyzing and improving the restart strategy of CMA- 
ES, viewed as a noisy hyper-parameter optimization problem in a 2D space (pop- 
ulation size, initial step-size). Two restart strategies are defined. The first one, 
NIPOP-aCMA-ES (New IPOP-aCMA-ES), differs from IPOP-CMA-ES as it si- 
multaneously increases the population size and decreases the step size. The sec- 
ond one, NBIPOP-aCMA-ES, allocates computational power to different restart 
settings depending on their current results. While these strategies have been 
designed with the BBOB benchmarks in mind [8], their generality is shown on 
a suite of real- world problems [16]. 

The paper is organized as follows. After describing the weighted active (n/[x w , A)- 
CMA-ES and its current restart strategies (section 2), the proposed restart 
schemes are described in section 3. Section 4 reports on their experimental vali- 
dation. The paper concludes with a discussion and some perspectives for further 
research. 
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2 The Weighted Active (n/n w , A)-CMA-ES 



The CMA-ES algorithm is a stochastic optimizer, searching the continuous space 
W D by sampling A candidate solutions from a multivariate normal distribution 
[10,9]. It exploits the best fi solutions out of the A ones to adaptively estimate 
the local covariance matrix of the objective function, in order to increase the 
probability of successful samples in the next iteration. The information about 
the remaining (worst A — fi) solutions is used only implicitly during the selection 
process. 

In active (fi/ni, A)-CMA-ES however, it has been shown that the worst so- 
lutions can be exploited to reduce the variance of the mutation distribution in 
unpromising directions [12], yielding a performance gain of a factor 2 for the 
active (n/ni, A)-CMA-ES with no loss of performance on any of tested func- 
tions. A recent extension of the (fj,/fi w , A)-CMA-ES, weighted active CMA-ES 
[11] (referred to as aCMA-ES for brevity) shows comparable improvements on 
a set of noiseless and noisy functions from the BBOB benchmark suite [7]. In 
counterpart, aCMA-ES no longer guarantees the covariance matrix to be posi- 
tive definite, possibly resulting in algorithmic instability. The instability issues 
can however be numerically controlled during the search; as a matter of fact they 
are never observed on the BBOB benchmark suite. 

At iteration t, (/j,/ij, w , A)-CMA-ES samples A individuals according to 

4 t+1 W(m«y*) 2 CW), fc = l...A, (1) 

where J\f(m, C) denotes a normally distributed random vector with mean 
m and covariance matrix C. 

These A individuals are evaluated and ranked, where index i : A denotes the i- 
th best individual after the objective function. The mean of the distribution is up- 
dated and set to the weighted sum of the best fi individuals (m = J2i=i w i x i*\i 
with W{ > for i = 1 . . . fj, and X)f=i w i = !)• 

The active CMA-ES only differs from the original CMA-ES in the adaptation 
of the covariance matrix C^K Like for CMA-ES, the covariance matrix is com- 
puted from the best fi solutions, C+ = YJt=i w t x ' :A CT T m ' x O^-mT _ The main 
novelty is to exploit the worst solutions to compute C~ = Yli=o w i+tV ' x-i-.xv'x-i-xi 
where y x = H n ^ +1+ '' A —jH x XA ~' ;A t ~" ■ The covariance matrix es- 

j| C t - 1/2 {x X -i:\-m t )\\ <? 

timation of these worst solutions is used to decrease the variance of the mutation 
distribution along these directions: 

C t+1 = {l-c 1 - C ,+c-a- old )C t + 

+cip*+ V+ lT + (c„ + c (1 - a- M )) C+-C-C-, 

where is adapted along the evolution path and coefficients ci, c^, c _ 

and a~ ld are defined such that ci + c p — c~a~ ld < 1. The interested reader is 
referred to [10, 11] for a more detailed description of these algorithms. 



As mentioned, CMA-ES has been extended with restart strategies to ac- 
commodate multi-modal fitness landscapes, and to specifically handle objective 
functions with many local optima. As observed by [9] , the probability of reaching 
the optimum (and the overall number of function evaluations needed to do so) 
is very sensitive to the population size. The default population size Xdefauit has 
been tuned for uni-modal functions; it is hardly large enough for multi-modal 
functions. Accordingly, [2] proposed a "doubling trick" restart strategy to en- 
force global search: the restart (fj,/fj, w , A)-CMA-ES with increasing population, 
called IPOP-CMA-ES, is a multi-restart strategy where the population size of 
the run is doubled in each restart until meeting a stopping criterion. 

The BIPOP-CMA-ES instead considers two restart regimes. The first one, 
which corresponds to IPOP-CMA-ES, doubles the population size \i arg e = 
2 lrestart Xdefauit in each restart i re start and uses a fixed initial step-size of e = 

^default ' 

The second regime uses a small population size X sma U ar >d initial step-size <J® mall , 
which are randomly drawn in each restart as: 



A,. 



Xdefauit (2 xJefa^lt) 



U[0,1] 2 



small 



a default X 10 



-2C7[0,1] 



(3) 



where U[0, 1] stands for the uniform distribution in [0, 1]. Population size X sma ii 
thus varies G [Xdefauit, Xi arge /2}. BIPOP-CMA-ES launches the first run with 
default population size and initial step-size. In each restart, it selects the restart 
regime with less function evaluations. Clearly, the second regime consumes less 
function evaluations than the doubling regime; it is therefore launched more 
often. 



3 Alternative Restart Strategies 
3.1 Preliminary Analysis 

The restart strategies of IPOP- and BIPOP-CMA-ES are viewed as a search in 
the hyper-parameter space. 

IPOP-CMA-ES only aims at adjusting population size A. It is motivated by 
the results observed on multi- modal problems [9] , suggesting that the population 
size must be sufficiently large to handle problems with global structure. In such 
large population size is needed to uncover this global structure and to 
lead the algorithm to discover the global optimum. IPOP-CMA-ES thus increases 
the population size in each restart, irrespective of the results observed so far; at 
each restart, it launches a new CMA-ES with population size A = p % { r l^ tart Xdefauit 
(see o on Fig. 1). Factor pi nc must be not too large to avoid "overjumping" 
some possibly optimal population size A* ; it must also be not too small in order 
to reach A* in a reasonable number of restarts. The use of the doubling trick 
{pine — 2) guarantees that the loss in terms of function evaluations (compared 
to the "oracle" restart strategy which would directly set the population size to 
the optimal value A*) is about a factor of 2. 
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Fig. 1. Restart performances in the 2D hyper-parameter space (population size and 
initial mutation step size in log. coordinates). For each objective function (20 dimen- 
sional Rastrigin - top-left, Gallagher 21 peaks - top-right, Katsuuras - bottom-left and 
Lunacek bi-Rastrigin bottom-right), the median best function value out of 15 runs is 
indicated. Legends indicate that the optimum up to precision f(x) — 10 -10 is found 
always (+), sometimes (©) or never (o). Black regions are better than white ones. 



On the Rastrigin 20-D function, IPOP-CMA-ES performs well and always 
finds the optimum after about 5 restarts (Fig. 1, top- left). The Rastrigin func- 
tion displays indeed a global structure where the optimum is the minimizer of 
this structure. For such functions, IPOP-CMA-ES certainly is the method of 
choice. For some other functions such as the Gallagher function, there is no such 
global structure; increasing the population size does not improve the results. On 
Katsuuras and Lunacek bi-Rastrigin functions, the optimum can only be found 
with small initial step-size (lesser than the default one); this explains why it can 
be solved by BIPOP-CMA-ES, sampling the two-dimensional (A, a) space. 

Actually, the optimization of a multi-modal function by CMA-ES with restarts 
can be viewed as the optimization of the function h(9), which returns the op- 
timum found by CMA-ES defined by the hyper-parameters 0=(\,a). Function 
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Fig. 2. An illustration of A and a hyper-parameters distribution for 9 restarts of IPOP- 
aCMA-ES (o), BIPOP-aCMA-ES (o and • for 10 runs), NIPOP-aCMA-ES (□) and 
NBIPOP-aCMA-ES (□ and many A for X/\ default = 1, a/a default £ [1(T 2 , 10°]). The 
first run of all algorithms corresponds to the point with \/\d e fault = 1, o /<Tde fault = 1- 

h{9), graphically depicted in Fig. 1 can be viewed as a black box, computation- 
ally expensive and stochastic function (reflecting the stochasticity of CMA-ES). 
Both IPOP-CMA-ES and BIPOP-CMA-ES are based on implicit assumptions 
about the h(8): IPOP-CMA-ES achieves a deterministic uni-dimensional trajec- 
tory, and BIPOP-CMA-ES randomly samples the 2-dimensional search space. 

Function h{9) also can be viewed as a multi-objective fitness, since in addition 
to the solution found by CMA-ES, h(Q) could return the number of function 
evaluations needed to find that solution. h(6) could also return the computational 
effort SP1 (i.e. the average number of function evaluations of all successful runs, 
divided by proportion of successful runs). However, SP1 can only be known for 
benchmark problems where the optimum is known; as the empirical optimum is 
used in lieu of true optimum, SP1 can only be computed a posteriori. 

3.2 Algorithm 

Two new restart strategies for CMA-ES, respectively referred to as NIPOP- 
aCMA-ES and NBIPOP-aCMA-ES, are presented in this paper. 

If the restart strategy is restricted to the case of increasing of population size 
(IPOP), we propose to use NIPOP-aCMA-ES, where we additionally decrease 
the initial step-size by some factor p a dec- The rationale behind this approach 
is that the CMA-ES with relatively small initial step-size is able to explore 
small basins of attraction (see Katsuuras and Lunacek bi-Rastrigin functions on 
Fig. 1), while with initially large step-size and population size it will neglect 
the local structure of the function, but converge to the minimizer of the global 
structure. Moreover, initially, relatively small step-size will quickly increase if it 
makes sense, and this will allow the algorithm to recover the same global search 
properties than with initially large step-size (see Rastrigin function on Fig. 1). 



NIPOP-CMA-ES thus explores the two-dimensional hyper-parameter space 
in a deterministic way (see □ symbols on Fig. 2). For Pa-dec = 1.6 used in this 
study, NIPOP-CMA-ES thus reaches the lower bound (a = 10~ 2 a default) used 
by BIPOP-CMA-ES after 9 restarts, expectedly reaching the same performance 
as BIPOP-CMA-ES albeit it uses only a large population. 

The second restart strategy, NBIPOP-aCMA-ES, addresses the case where 
the probability to find the global optimum does not much vary in the (A, a) space. 
Under this assumption, it makes sense to have many restarts for a fixed budget 
(number of function evaluations). Specifically, NBIPOP-aCMA-ES implements 
the competition of the NIPOP-aCMA-ES strategy (increasing A and decreasing 
initial a in each restart) and a uniform sampling of the o space, where A is set to 
^default and er° = <J° default x The selection between the two (NIPOP- 

aCMA-ES and the uniform sampling) depends on the allowed budget like in 
NBIPOP-aCMA-ES. The difference is that NBIPOP-aCMA-ES adaptively sets 
the budget allowed to each restart strategy, where the restart strategy leading 
to the overall best solution found so far is allowed twice (pbudget = 2) a budget 
compared to the other strategy. 

4 Experimental Validation 

The experimental validation of NIPOP-aCMA-ES and NBIPOP-aCMA-ES in- 
vestigates the performance of the approach comparatively to IPOP-aCMA-ES 
and BIPOP-aCMA-ES on BBOB noiseless problems and one black-box real- 
world problem related to spacecraft trajectory optimization. The default param- 
eters of CMA-ES [11,5] are used. This section also presents the first experimental 
study of BIPOP-aCMA-ES 3 , the active version of BIPOP-CMA-ES [5]. 

4.1 Benchmarking with BBOB Framework 

The BBOB framework [7] is made of 24 noiseless and 30 noisy functions [8]. Only 
the noiseless case has been considered here. Furthermore, only the 12 multi- 
modal functions among these 24 noiseless functions are of interest for this study, 
as CMA-ES can solve the 12 other functions without any restart. 

With same experimental methodology as in [7] , the results obtained on these 
benchmark functions are presented in Fig. 4 and Table 1. The results are given 
for dimension 40, because the differences are larger in higher dimensions. The 
expected running time (ERT), used in the figures and table, depends on a 
given target function value, / t = / op t + Af. It is computed over all relevant trials 
as the number of function evaluations required in order to reach / t , summed over 
all 15 trials, and divided by the number of trials that actually reached ft [7]. 

NIPOP-aCMA-ES. On 6 out of 12 test functions (/i 5 ,/i6,/i7,/i8,/23,/ 2 4) 
NIPOP-aCMA-ES obtains the best known results for BBOB-2009 and BBOB- 
2010 workshops. On / 2 3 Katsuuras and / 2 4 Lunacek bi-Rastrigin, NIPOP-aCMA- 
ES has a speedup of a factor from 2 to 3, as could have been expected. It performs 

3 For the sake of reproducibility, the source code for NIPOP-aCMA-ES and NBIPOP- 
aCMA-ES is available at https://sites.google.com/site/ppsnbipop/ 



unexpectedly well on /i 6 Weierstrass functions, 7 times faster than IPOP-aCMA- 
ES and almost 3 times faster than BIPOP-aCMA-ES. Overall, according to Fig. 
4, NIPOP-aCMA-ES performs as well as BIPOP-aCMA-ES, while restricted to 
only one regime of increasing population size. 

NBIPOP-aCMA-ES. Thanks to the hrst regime of increasing population 
size, NBIPOP-aCMA-ES inherits some results of NIPOP-aCMA-ES. However, 
on functions where the population size does not play any important role, it 
performs significantly better than BIPOP-aCMA-ES. This is the case for f 2 i 
Gallagher 101 peaks and f 2 2 Gallagher 21 peaks functions, where NBIPOP- 
aCMA-ES has a speedup of a factor of 6. It seems that the adaptive choice 
between two regimes works efficiently on all functions except on fi$ Weierstrass. 
In this last case, NBIPOP-aCMA-ES mistakingly prefers small populations, with 
a loss factor 4 compared to NIPOP-aCMA-ES. According to Fig. 4, NBIPOP- 
aCMA-ES performs better than BIPOP-aCMA-ES on weakly structured multi- 
modal functions, showing overall best results for BBOB-2009 and BBOB-2010 
workshops in dimensions 20 (results not shown here) and 40. 

Due to space limitations, the interested reader is referred to [13] for a detailed 
presentation of the results. 

4.2 Interplanetary Trajectory Optimization 

The NIPOP-aCMA-ES and NBIPOP-aCMA-ES strategies, designed for the BBOB 
benchmark functions, can possibly overfit this benchmark suite. In order to test 
the generality of these strategies, a real-world black-box problem is considered, 
pertaining to a completely different domain: Advanced Concepts Team of Eu- 
ropean Space Agency is making available several difficult spacecraft trajectory 
optimization problems as black box functions to invite the operational research 
community to compare different derivative-free solvers on these test problems 
[16]. 

The following results consider the 18-dimensional bound-constrained black- 
box function "TandEM-Atlas501" , that dchnes an interplanetary trajectory to 
Saturn from the Earth with multiple fly-bys, launched by the rocket Atlas 501. 
The final goal is to maximize the mass f(x), which can be delivered to Saturn 
using one of 24 possible fly-by sequences with possible maneuvers around Venus, 
Mars and Jupiter. 

The first best results was found for a sequence Earth- Venus-Earth-Earth- 
Saturn (f max = 1533.45) in 2008 by B. Addis ct al. [1]. The best results so far 
(fmax — 1673.88) was found in 2011 by G. Stracquadanio et al. [15]. 

All versions of CMA-ES with restarts have been launched with a maximum 
budget of 10 s function evaluations. All variables are normalized in the range 
[0, 1]. In the case of sampling outside of boundaries, the fitness is penalized and 
becomes f(x) = f(xf eas ibie) - a \\x ~ Xf eas ibie || 2 , where x feasible is the closest 
feasible point from point x and a is a penalty factor, which was arbitrarily set 
to 1000. 

As shown on Fig. 3, the new restart strategies NIPOP-aCMA-ES and NBIPOP- 
aCMA-ES respectively improve on the former ones (IPOP-aCMA-ES and BIPOP- 




Number of function evaluations Number of function evaluations 

Fig. 3. Comparison of all CMA-ES restart strategies on the Tandem fitness function 
(mass): median (left) and best (right) values out of 30 runs. 



aCMA-ES); further, NIPOP-aCMA-ES reaches same performances as BIPOP- 
aCMA-ES. 

The best solution found by NBIPOP-aCMA-ES 4 improves on the best so- 
lution found in 2008, while it is worse than the current best solution, which is 
blamed on the lack of problem specific heuristics [1, 15], on the possibly insuffi- 
cient time budget (10 s fitness evaluations), and also on the lack of appropriate 
constraint handling heuristics. 

5 Conclusion and Perspectives 

This paper contribution regards two new restart strategies for CMA-ES. NIPOP- 
aCMA-ES is a deterministic strategy simultaneously increasing the population 
size and decreasing the initial step-size of the Gaussian mutation. NBIPOP- 
aCMA-ES implements a competition between NIPOP-aCMA-ES and a random 
sampling of the initial mutation step-size, adaptively adjusting the computa- 
tional budget of each one depending on their current best results. Besides the 
extensive validation of NIPOP-aCMA-ES and NBIPOP-aCMA-ES on the BBOB 
benchmark, the generality of these strategies has been tested on a new problem, 
related to interplanetary spacecraft trajectory planning. 

The main limitation of the proposed restart strategies is to quasi implement 
a deterministic trajectory in the 9 space. Further work will consider h(Q) as yet 
another expensive noisy black-box function, and the use of a CMA-ES in the 
hyper-parameter space will be studied. The critical issue is naturally to keep 



4 x =[0.83521, 0.45092, 0.50284, 0.65291, 0.61389, 0.75773, 0.43376, 1, 0.89512, 
0.77264, 0.11229, 0.20774, 0.018255, 6.2057e-09, 4.0371e-08, 0.2028, 0.36272, 
0.32442]; fitness(x) = mass(x) = 1546.5 



the overall number of fitness evaluations beyond reasonable limits. A surrogate- 
based approach will be investigated [14], learning and exploiting an estimate of 
the (noisy and stochastic) h{9) function. 
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Fig. 4. Bootstrapped empirical cumulative distribution of the number of objective 
function evaluations divided by dimension (FEvals/D) for 50 targets in lo' -8 " 2 ' for all 
functions and weakly structured multi-modal subgroup in 40-D. The "best 2009" line 
corresponds to the best ERT observed during BBOB 2009 for each single target. 
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Table 1. Overall results on multi-modal functions /3 — 4 and /15 — 24 in dimension 
d = 40: Expected running time (ERT in number of function evaluations) divided by 
the respective best ERT measured during BBOB-2009 for precision Af ranging in 10* , 



i — 1 ... — 7. The median number of conducted function evaluations is additionally 
given in italics, if ERT(10~ 7 ) = oo. #succ is the number of trials that reached the final 
target / opt + 10 -8 . Best results are printed in bold. For a more detailed (statistical) 
analysis of results on BBOB problems, please see [13]. Statistically significantly better 
entries (Wilcoxon rank-sum test with p = 0.05) are indicated in bold. The interested 
reader is referred to [13] for the statistical analysis and discussion of these results. 
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